Some legal / non-technical stuff when doing scraping

I AM NOT A LAWYER. This does not constitute legal advice.

As I mentioned right at the start, scraping is slightly controversial.

Legal battles have been fought, companies sued.

With great power comes great responsibility! Ahem. Someone call the Cliche Police.

Always follow the site's terms and conditions (T&C).

Avoid scraping huge companies like Google, Facebook, Twitter etc. They will ban you. The big guys will usually provide APIs. Use those.

These APIs may be:

* Free, like Reddit
* Free but require registration, like Twitter
* Free only for limited use, like Google

Even the paid ones are still cheaper than lawsuits. Pay a few dollars to save the hassle.

If you challenge a company's business directly, they will sue.

http://en.wikipedia.org/wiki/EBay_v._Bidder%27s_Edge

Even otherwise, act polite.

A robot can send a million requests a minute, and crash the server.

Don't send a million requests a second. Put sleeps in, and run scrapers once a day, preferably at night.
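
Here is a minimal sketch of what "putting sleeps in" might look like. The names `fetch`, `polite_scrape` and `delay_seconds` are made up for this example, and the 2 second default is just an assumption, not a recommended value. Pick a delay that suits the site you are scraping.

```python
import time
import urllib.request


def fetch(url, timeout=10):
    """Download one page and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def polite_scrape(urls, delay_seconds=2.0, fetch_page=fetch, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    fetch_page and sleep are parameters only so they can be swapped
    out (e.g. for testing); by default the real ones are used.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one,
            # so the server never sees a burst of traffic from us.
            sleep(delay_seconds)
        pages[url] = fetch_page(url)
    return pages
```

You would call it with a list of URLs, something like `polite_scrape(list_of_urls, delay_seconds=5)`. Running it once a day, at night, is better left to a scheduler such as cron than to the script itself.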

Ask yourself this: How would I feel if someone did this to my website and crashed it?

Finally, do you need to do scraping?

A quick 5 minutes of Googling tells me there are many companies that will scrape for you. Would it be cheaper / easier to use them? E.g.:

https://www.kimonolabs.com/

https://import.io/

